We combined data from several datasets:
This dataset contains daily updates from 56 states and territories on numbers of positive and negative COVID-19 tests and deaths from March 4 to current date (in total of 33 timepoints). Since 2020/03/16, every day 56 rows, each representing a US state or territory, are added to the existing dataset.
This dataset is published by EduWeek.com, and regards status over time of school closings by district and then by state.
This dataset is published by The Henry J. Kaiser Family Foundation and summarizes state-wide government for each US state.
This dataset is published by the US Census Bureau. It contains the estimation of population by state for the year of 2019.
The first six rows of our merged dataset appears as follows:
Our merged dataset has the following columns and attributes:
state [chr]: State abbreviation
date [date]: Date of the data collected
positive [numeric]: Number of total tests that resulted in positive results reported
negative [numeric]: Number of total tests that resulted in negative results reported
total [numeric]: Number of total conducted tests reported
death [numeric]: Number of total deaths reported
Governor.Political.Affiliation[chr]: Party affiliation of the state governor
StateClosureStartDate [date]: Date state enforced public school closure
Region [chr]: The region of the US the state is located in
POPESTIMATE2019 [numeric]: The state's estimated population
ClosureDateCat [chr]: The tertile the state's school closure date falls in. The first third of states to close are placed in the "Early" category, the next third in the "Middle" category and the last in the "Late".
Figure 1: The number of rows recorded daily, where each row in each date corresponds to a state. We use data that has all state data for each day which starts from March 16 to the last time we pulled the data.
Figure 2: The number of states closing public schools for each date
Figure 3: The cumulative number of tests, positive and negative results, and death daily
State Historical Data in this COVID Tracking Project
Data Source: statewide public health departments. https://covidtracking.com/data
Sources of bias:
Selection bias introduced by access to tests: number of cases is partially a function of number of tests performed; due to the limited supplies and strict qualifications, there is a huge disparity in terms of who can get tests.
Due to the rapidly evolving nature of the pandemic, testing facilities were opened and utilized at different times. “As of 14 March 2020, public health laboratories using the CDC assay are no longer required by FDA to submit samples to CDC for confirmation. CDC is maintaining surge capacity while focusing on other support to state public health and on improving options for diagnostics for use in the public health”. Protocols across different testing facilities could vary, which introduces another form of bias, even within the same state. Note that in the case of Massachusetts, numbers included repeat testing between CDC confirmed cases and tests sent to US Public Health Labs.
Some states report numbers of people hospitalized, in ICU, and on ventilators whereas some do not, as state-level surveillance efforts on these metrics vary drastically. The COVID Tracking Project website strongly cautions against comparing these metrics between states.
State-wide school closure enforcement date from Education Week
Data sources: Local news reporting; National Center for Education Statistics; school/district websites; government websites; clicking through suggests strong reliance on state-level announcements
Sources of bias:
Using state-wide data may not reflect earlier (potentially effective) actions taken at district levels
May not represent private schools who enacted closures more quietly.
Does not reflect schools where parents pulled students prior to formal school closings, so there may be some trends at local levels attributable to those actions, prior to the formal state-level announcements.
Does not reflect that some school districts, such as Boston, delayed closure in order to continue providing food to their poorer students who depend on the school’s assistance via the free meal program.
State governor’s political parties from The Henry J. Kaiser Family Foundation
Data source: National Governors Association as of January 3, 2019
Sources of bias:
State population data (estimation for 2019) from the US Census Bureau
Data sources: Census Bureau estimations
Sources of bias:
Base estimations based on different data sources
Estimates are typically based on an estimated population consistent with the most recent decennial census and are produced using the cohort-component method
Unclear exactly how the Census Bureau makes these estimations. May have hidden biases. If the census is not actually reflective of the population the estimates are not either.
With the data outlined above, one can create visualizations that illustrate :
overall positive case rate in the US centered around school closure date
state-by-state timing of school closures, testing availability, positive test rate, increasing cases rate and death rate
Visualization Tasks
Overview via parallel coordinate plots that display states’ school closure date information, total tests performed, governor political party and region
Map that shows geographical relationships via map with ability to hover to reveal state characteristics (i.e. governor political party and state-mandated school closure)
Detailed line plot and heatmap displaying selected entities (listed below) centered around school closure date and showing incubation periods
50 states make for a lot of data points to display all at once. To give an overview in a way that someone can easily see a pattern in the data without it looking very busy is one of our challenges for our visualization.
We may find that there is small signal or no pattern in our visualization.
Target audience (policymakers) must find visualization user-friendly.
We are limited by Plotly’s abilities
We will first merge the COVID tracking data and the school closure data by state; all missing values (i.e., the number of tests/cases) will be empty on the visualization (no imputation will be conducted). The only calculation we need to compute is the rate of change in cases, which will be calculated when the visualization loads. The visualization will be implemented in R, specifically using Shiny and Plotly.
The Parallel Coordinates Plot will be implemented first with the ability to click on the category axis title to sort (and color) each state, including: governor political party, region, time of school closure (grouped as categorical). We implement the parallel coordinate (PC) plot with at least five different axes: states, governor party, region, the closure date, and the current number of cases. Whenever the user selects a category on the dropdown list, both the region colors on the map and the line colors on the PC plot will be updated accordingly.
The map will be implemented next. When hovering over the state, the total number of positive test results will appear.
By default, a line chart will be displayed next to the map showing the individual data and best fit line of the # of cases over time for each category (or the US as a whole). The line chart shows the number of cases by time centered by school closure date, such that the x-axis represents x number of days before and after school closure. A toggle button will be added, enabling the user to toggle between the line chart and a heatmap that indicates the case rate (i.e. % change in cases, or the derivative) over time.
Since we completed this process virtually, we adapted the design sheet process such that sheets 2,3 and 4 were done in parallel before meeting as a group to combine them for sheet 5. The design sheets are below in slide format.
Sheet 1 is zoomed in so that the font is not too small.
Sheet 1: Brain Storm (done collaboratively)
Sheet 2: Initial Design A (Jon)
Sheet 3: Initial Design B (Kathleen)
Sheet 4: Initial Design C (Chen)
Sheet 5: Realization Design (done collaboratively)
Chen
Jon
Kathleen
Dany
Overall layout of visualization
Side bar controls
Parallel Coordinate Plot - Linkage by selecting Democratic States. Demonstrates that democratic states had higher case loads (see PC plot, % positive tests) but no difference in closing times compared to Republican states.
Parallel Coordinate Plot - Comparing Early vs Late Closures. Suggests no significant difference in state characteristics among closure time.
Map - showing range of timing of school closures by color
Line graph of % change in new deaths, showing trend lines before and after school closure by political party. Suggests improved % change in new deaths after school closure in Democratic states (i.e. bending the curve) - however this could be because there was already a high case loads in those states (see above), as incubation period of 5 days is unlikely to reflect a change in deaths.
Heat map of % change in new positive cases by state, over time. Suggests improving % change in new cases after schools were closed (moving downward, below orange line on graphs).
Like all datasets reporting number of positives, this is somewhat reflective of total testing as well as true case load.
In the early stages of the pandemic, many interventions were implemented at once (school closings, travel plans, social distancing, stay-at-home orders), thus it is difficult to tease out which intervention has the greatest effect.
It is difficult to separate cause from effect.
There may be some confounding factors that have yet to be identified.
Considering an incubation period: there is a delay in any effect from a policy.
In the future, we would like to:
Extend data as schools re-open to look at effects of school openings
Can extend data to when schools re-open and look at effects of re-opening on case loads and rates
Visualize other policies in the same manner. Other policies include recommendations/mandates to wear a mask in public, group size limitations etc.
Define groups of school closure relative to occurrence of the first case or any other case threshold
Below we show these ideas on a design sheet:
Future Directions Sheet 1
The COVID Tracking Project From The Atlantic. Historical state data, downloaded from https://covidtracking.com/api, April 28, 2020.
Education Week. Coronavirus and School Closures: State Data, downloaded from https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html, April 7, 2020.
The Henry J. Kaiser Family Foundation. State Political Parties, downloaded from https://www.kff.org/other/state-indicator/state-political-parties/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D, April 26, 2020.
United States Census Bureau. State population data (estimation for 2019), downloaded from https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/, May 1, 2020.